
kNN classifier




On Convergence of Nearest Neighbor Classifiers over Feature Transformations

Neural Information Processing Systems

The k-Nearest Neighbors (kNN) classifier is a fundamental non-parametric machine learning algorithm. However, it is well known that it suffers from the curse of dimensionality, which is why in practice one often applies a kNN classifier on top of a (pre-trained) feature transformation. From a theoretical perspective, most, if not all, theoretical results aimed at understanding the kNN classifier are derived for the raw feature space. This leads to an emerging gap between our theoretical understanding of kNN and its practical applications. In this paper, we take a first step towards bridging this gap.
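The setting the abstract describes — running kNN on top of a feature transformation rather than on raw features — can be sketched as follows. This is a minimal illustration, not the paper's analysis; the transform `phi` below is an arbitrary stand-in for a pre-trained feature map.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3, transform=None):
    """Majority-vote kNN; if `transform` is given, distances are
    computed in the transformed feature space instead of the raw one."""
    if transform is not None:
        X_train = transform(X_train)
        X_query = transform(X_query)
    preds = []
    for q in X_query:
        dists = np.linalg.norm(X_train - q, axis=1)   # Euclidean distances to q
        nn = np.argsort(dists)[:k]                    # indices of the k nearest
        labels, counts = np.unique(y_train[nn], return_counts=True)
        preds.append(labels[np.argmax(counts)])       # majority vote
    return np.array(preds)

# Two toy clusters; phi is a hypothetical stand-in for a pre-trained feature map.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])
phi = lambda Z: Z / (1.0 + np.linalg.norm(Z, axis=1, keepdims=True))
pred = knn_predict(X, y, np.array([[0.0, 0.5], [5.0, 5.5]]), k=3, transform=phi)
```

The same classifier runs in either space; only the geometry in which "nearest" is measured changes, which is exactly the gap between theory (raw space) and practice (transformed space) that the paper studies.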


On pattern classification with weighted dimensions

Mollah, Ayatullah Faruk

arXiv.org Artificial Intelligence

Studies on various facets of pattern classification are often imperative while working with multi-dimensional samples pertaining to diverse application scenarios. In this context, weighted dimension-based distance measures have been one of the vital considerations in pattern analysis, as they reflect the degree of similarity between samples. Though the matter is often presumed settled by the pervasive use of Euclidean distance, a plethora of issues often surface. In this paper, we present (a) a detailed analysis of the impact of distance-measure norms and dimension weights, along with visualization, (b) a novel weighting scheme for each dimension, (c) incorporation of this dimensional weighting scheme into a KNN classifier, and (d) pattern classification on a variety of synthetic as well as realistic datasets with the developed model. It has performed well across diverse experiments in comparison to the traditional KNN under the same experimental setups. Specifically, for gene expression datasets, it yields a significant and consistent gain in classification accuracy (around 10%) in all cross-validation experiments with different values of k. As such datasets contain a limited number of high-dimensional samples, meaningful selection of nearest neighbours is desirable, and this requirement is reasonably met by regulating the shape and size of the region enclosing the k reference samples with the developed weighting scheme and an appropriate norm. It therefore stands as an important generalization of the KNN classifier powered by weighted Minkowski distance with the present weighting scheme.
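A KNN classifier under a weighted Minkowski distance can be sketched as below. The paper's own weighting schema is not reproduced here; the weight vector `w` is taken as given, and the toy weights in the example are purely illustrative.

```python
import numpy as np

def weighted_minkowski(x, y, w, p=2.0):
    """Weighted Minkowski distance: (sum_i w_i |x_i - y_i|^p)^(1/p).
    Larger w_i makes dimension i count more toward the distance."""
    return np.sum(w * np.abs(x - y) ** p) ** (1.0 / p)

def weighted_knn_predict(X_train, y_train, X_query, w, k=3, p=2.0):
    """Majority-vote KNN under a weighted Minkowski distance;
    the weights w are assumed supplied (the paper derives its own schema)."""
    preds = []
    for q in X_query:
        dists = np.array([weighted_minkowski(x, q, w, p) for x in X_train])
        nn = np.argsort(dists)[:k]
        labels, counts = np.unique(y_train[nn], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

# Dimension 0 is informative, dimension 1 is noise; down-weighting the
# noisy dimension reshapes the neighborhood and flips the prediction.
X = np.array([[0.0, 9.0], [0.5, -9.0], [4.0, 1.0], [4.5, 1.5]])
y = np.array([0, 0, 1, 1])
q = np.array([[1.0, 1.0]])
```

With uniform weights the noisy second dimension dominates and the query is assigned class 1; down-weighting it recovers class 0 — the "shape and size of the region enclosing the k reference samples" that the abstract refers to.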




Detection of Adulteration in Coconut Milk using Infrared Spectroscopy and Machine Learning

Al-Awadhi, Mokhtar A., Deshmukh, Ratnadeep R.

arXiv.org Artificial Intelligence

In this paper, we propose a system for detecting adulteration in coconut milk using infrared spectroscopy. The proposed machine-learning-based system comprises three phases: preprocessing, feature extraction, and classification. The first phase involves removing irrelevant data from the coconut milk spectral signals. In the second phase, we employ the Linear Discriminant Analysis (LDA) algorithm to extract the most discriminating features. In the third phase, we use the K-Nearest Neighbor (KNN) model to classify coconut milk samples as authentic or adulterated. We evaluate the performance of the proposed system using a public dataset comprising Fourier Transform Infrared (FTIR) spectral information of pure and contaminated coconut milk samples. Findings show that the proposed method successfully detects adulteration with a cross-validation accuracy of 93.33%.
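The LDA-then-KNN pipeline can be sketched for the two-class case as follows. This is not the paper's implementation — the spectral preprocessing and hyperparameters are not given in the abstract — and the synthetic data below is a hypothetical stand-in for the FTIR spectra.

```python
import numpy as np

def lda_direction(X, y):
    """Fisher discriminant direction for two classes:
    w = Sw^{-1} (mu1 - mu0), with Sw the pooled within-class scatter."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    w = np.linalg.solve(Sw + 1e-6 * np.eye(X.shape[1]), mu1 - mu0)
    return w / np.linalg.norm(w)

def knn_on_projection(X_train, y_train, X_query, w, k=3):
    """Majority-vote KNN in the 1-D LDA feature space."""
    z_train, z_query = X_train @ w, X_query @ w
    preds = []
    for z in z_query:
        nn = np.argsort(np.abs(z_train - z))[:k]
        labels, counts = np.unique(y_train[nn], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

# Synthetic stand-in for the spectra: authentic (0) vs adulterated (1)
# samples differ mainly along one (here, the third) spectral band.
rng = np.random.default_rng(0)
X_auth = rng.normal(0.0, 1.0, size=(40, 5))
X_adul = rng.normal(0.0, 1.0, size=(40, 5))
X_adul[:, 2] += 3.0
X = np.vstack([X_auth, X_adul])
y = np.array([0] * 40 + [1] * 40)
w = lda_direction(X, y)
```

LDA compresses the high-dimensional signal into the single most discriminating direction, after which the KNN vote operates on one scalar score per sample.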


A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys

Luo, Yufeng, Myers, Adam D., Drlica-Wagner, Alex, Dematties, Dario, Borchani, Salma, Valdes, Frank, Dey, Arjun, Schlegel, David, Zhou, Rongpu, Team, DESI Legacy Imaging Surveys

arXiv.org Artificial Intelligence

As the data volume of astronomical imaging surveys rapidly increases, traditional methods for image anomaly detection, such as visual inspection by human experts, are becoming impractical. We introduce a machine-learning-based approach to detect poor-quality exposures in large imaging surveys, with a focus on the DECam Legacy Survey (DECaLS) in regions of low extinction (i.e., $E(B-V)<0.04$). Our semi-supervised pipeline integrates a vision transformer (ViT), trained via self-supervised learning (SSL), with a k-Nearest Neighbor (kNN) classifier. We train and validate our pipeline using a small set of labeled exposures observed by surveys with the Dark Energy Camera (DECam). A clustering-space analysis of where our pipeline places images labeled in "good" and "bad" categories suggests that our approach can efficiently and accurately determine the quality of exposures. Applied to new imaging being reduced for DECaLS Data Release 11, our pipeline identifies 780 problematic exposures, which we subsequently verify through visual inspection. Being highly efficient and adaptable, our method offers a scalable solution for quality control in other large imaging surveys.
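The kNN stage of such a pipeline can be sketched as below. The ViT/SSL training itself is out of scope here; this sketch assumes exposure embeddings are already computed, and it uses cosine similarity — a common choice for SSL embeddings, though the abstract does not state the pipeline's actual metric.

```python
import numpy as np

def knn_quality_label(emb_labeled, labels, emb_query, k=3):
    """Assign a quality label to each query exposure by majority vote
    over its k most cosine-similar labeled exposures."""
    E = emb_labeled / np.linalg.norm(emb_labeled, axis=1, keepdims=True)
    Q = emb_query / np.linalg.norm(emb_query, axis=1, keepdims=True)
    sims = Q @ E.T                   # cosine similarities, (n_query, n_labeled)
    out = []
    for row in sims:
        nn = np.argsort(row)[-k:]    # k most similar labeled exposures
        vals, counts = np.unique(labels[nn], return_counts=True)
        out.append(vals[np.argmax(counts)])
    return np.array(out)

# Toy embeddings: "good" exposures cluster in one direction, "bad" in another.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
lab = np.array(["good", "good", "good", "bad", "bad"])
pred = knn_quality_label(emb, lab, np.array([[0.95, 0.05], [0.05, 0.9]]), k=3)
```

Because only a small labeled set is needed at this stage, the expensive component (the SSL-trained embedding model) can be reused unchanged across data releases or surveys.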


Discriminative Metric Learning by Neighborhood Gerrymandering

Shubhendu Trivedi, David Mcallester, Greg Shakhnarovich

Neural Information Processing Systems

We formulate the problem of metric learning for k nearest neighbor classification as a large margin structured prediction problem, with a latent variable representing the choice of neighbors and the task loss directly corresponding to classification error. We describe an efficient algorithm for exact loss augmented inference, and a fast gradient descent algorithm for learning in this model. The objective drives the metric to establish neighborhood boundaries that benefit the true class labels for the training points. Our approach, reminiscent of gerrymandering (redrawing of political boundaries to provide advantage to certain parties), is more direct in its handling of optimizing classification accuracy than those previously proposed. In experiments on a variety of data sets our method is shown to achieve excellent results compared to current state of the art in metric learning.
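The inference side of such a method — kNN under a learned linear metric, where dissimilarity is $\|Lx - Lq\|_2$ — can be sketched as follows. The paper's latent structured-prediction training of the metric is not reproduced; the map `L_fit` below is a hand-crafted, hypothetical example of the kind of transform metric learning produces.

```python
import numpy as np

def metric_knn_predict(X_train, y_train, X_query, L, k=3):
    """Majority-vote kNN where dissimilarity is ||L x - L q||_2 for a
    linear map L; learning L (e.g. by the large-margin structured
    objective in the paper) is out of scope here -- L is taken as given."""
    Xt = X_train @ L.T               # map all training points through L once
    preds = []
    for q in X_query @ L.T:
        dists = np.linalg.norm(Xt - q, axis=1)
        nn = np.argsort(dists)[:k]
        labels, counts = np.unique(y_train[nn], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

# Classes are separated along the diagonal direction (1, 1); the
# anti-diagonal direction carries only noise.
X = np.array([[2.12, -2.12], [-2.12, 2.12], [3.18, 2.48], [2.48, 3.18]])
y = np.array([0, 0, 1, 1])
q = np.array([[2.83, -1.41]])
L_raw = np.eye(2)                            # identity -> plain Euclidean kNN
L_fit = np.array([[0.5, 0.5], [0.5, 0.5]])   # hypothetical learned metric:
                                             # project onto the informative diagonal
```

Under the raw Euclidean metric the noisy direction pulls wrong-class points into the query's neighborhood; the projected metric redraws the neighborhood boundary so that the true class wins the vote — the "gerrymandering" the title alludes to.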


Review for NeurIPS paper: Fast Adversarial Robustness Certification of Nearest Prototype Classifiers for Arbitrary Seminorms

Neural Information Processing Systems

Additional Feedback: Overall, this paper is well presented and technically sound. However, I believe its technical contribution is minor and it does not have a significant impact on this field. Thus I vote for a weak reject. To increase the contribution of this paper, the authors could consider designing training algorithms that improve the provable robustness of NPCs. For example, RSLVQ is a strong method (in Table 1 it achieves very competitive clean test error); can we improve its robustness to the same level as the other baselines?